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Abstract 

Background: Common bean {Phaseolus vulgaris L.) is one of the most important legumes in the world. Several 
diseases severely reduce bean production and quality; therefore, it is very important to better understand disease 
resistance in common bean in order to prevent these losses. More than 70 resistance (ft) genes which confer 
resistance against various pathogens have been cloned from diverse plant species. Most ft genes share highly 
conserved domains which facilitates the identification of new candidate ft genes from the same species or other 
species. The goals of this study were to isolate expressed ft gene-like sequences (RGLs) from 454-derived 
transcriptomic sequences and expressed sequence tags (ESTs) of common bean, and to develop RGL-tagged 
molecular markers. 

Results: A data-mining approach was used to identify tentative P. vulgaris ft gene-like sequences from 
approximately 1.69 million 454-derived sequences and 116,716 ESTs deposited in GenBank. A total of 365 non- 
redundant sequences were identified and named as common bean (P. vulgaris = Pv) resistance gene-like 
sequences (PvRGLs). Among the identified PvRGLs, about 60% (218 PvRGLs) were from 454-derived sequences. 
Reverse transcriptase-polymerase chain reaction (RT-PCR) analysis confirmed that PvRGLs were actually expressed in 
the leaves of common bean. Upon comparison to P. vulgaris genomic sequences, 105 (28.77%) of the 365 tentative 
PvRGLs could be integrated into the existing common bean physical map. Based on the syntenic blocks between 
common bean and soybean, 237 (64.93%) PvRGLs were anchored on the P. vulgaris genetic map and will need to 
be mapped to determine order. In addition, 11 sequence-tagged-site (STS) and 19 cleaved amplified polymorphic 
sequence (CAPS) molecular markers were developed for 25 unique PvRGLs. 

Conclusions: In total, 365 PvRGLs were successfully identified from 454-derived transcriptomic sequences and ESTs 
available in GenBank and about 65% of PvRGLs were integrated into the common bean genetic map. A total of 30 
RGL-tagged markers were developed for 25 unique PvRGLs, including 11 STS and 19 CAPS markers. The expressed 
PvRGLs identified in this study provide a large sequence resource for development of RGL-tagged markers that 
could be used further for genetic mapping of disease resistant candidate genes and quantitative trait locus/loci 
(QTLs). This work also represents an additional method for identifying expressed RGLs from next generation 
sequencing data. 
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Background 

Plant resistance genes (R genes) play an important role in 
direct or indirect recognition of proteins encoded by speci- 
fic avirulence (Avr) genes of pathogens [1]. In recent years, 
a large number of R genes which confer resistance to 
diverse pathogens have been cloned from different plant 
species by using either map-based cloning or transposon 
tagging approaches [2-5]. Based on the amino acid 
sequence comparison among cloned R genes, several 
highly conserved domains, such as NBS, LRR, protein 
kinase (PK), transmembrane domain (TM), and TIR were 
identified as the majority of known R genes [6-9]. These 
distinct conserved domains provide scientists a convenient 
way to identify and then clone additional R genes and 
RGLs or resistance gene analogs (RGAs). 

The largest class consists of R genes which contain NBS 
and LRR domains (NBS-LRR). This class can be further 
divided into two sub-classes, TIR-NBS-LRR and non-TIR- 
NBS-LRR, based on the presence or absence of a TIR 
domain at the N-terminus [7,10]. The second class of R 
genes is comprised of those with LRR and PK domains 
(LRR-PK), such as Fls2 in Arabidopsis and Xa21 in rice 
[11,12], The third class contains those with a large extra- 
cellular LRR domain, for example, Cf-4 in tomato and Vfl 
in apple [13,14]. The fourth class contains those only with 
the PK domain, such as Pto in tomato and Pbsl in Arabi- 
dopsis [15,16]. And finally, the fifth class includes all other 
R genes that confer resistance through different mechan- 
isms, for example, Hml in maize and mlo in barley [3,17]. 

Based on conserved domains of known R genes, PCR 
amplification has been successfully used to isolate RGLs 
or RGAs from Arabidopsis [18,19], soybean [20-22], rice 
[23], maize [24,25], wheat [26], and many other plant spe- 
cies [27-30]. For example, Bertioli et al. isolated 78 RGAs 
from cultivated and wild peanuts by using degenerate pri- 
mers designed on NBS conserved domains [31]. Leal-Ber- 
tioli et al. identified candidate genome regions for disease 
resistances by using a combination of NBS profiling, 
sequence characterized amplified region (SCAR) markers, 
and Southern blot by probes with NBS encoding regions 
[32]. In common bean, eight classes of disease resistance 
related sequences and 15 RGAs have been isolated based 
on the conserved NBS and TIR domains [33,34]. An RGA, 
SB1, was found to cosegregate with the Crg gene which is 
known to be responsible for resistance to the rust patho- 
gen Uromyces appendiculatus [35]. 

Recently, data-mining approaches have been employed 
to identify RGLs and RGAs from plant EST databases. In 
maize, 199 RGLs have been identified using data-mining, 
modified amplified fragment length polymorphism 
(AFLP), and rapid amplification of cDNA ends (RACE) 
methods [25]. In sugarcane, Rossi et al. identified 88 R- 
gene-like ESTs via the data-mining method [36]. In 
wheat, Dilbirligi and Gill found 220 expressed 7?-gene 



candidates using modified RNA fingerprinting and data- 
mining approaches [37]. In Medicago, Zhu et al. have 
revealed 179 unique NBS-LRR RGLs by using combina- 
tion methods of database queries, hybridization with 
RGLs from related species, and PCR [38]. With the avail- 
ability of several complete and nearly complete plant 
genome sequences, R gene analyses at the whole genome 
level have been carried out in Arabidopsis [10], rice [39], 
Medicago [40], and Lotus [41]. As a result, 149 NBS- 
LRR-encoding genes and an additional 58 related genes 
have been identified in the Arabidopsis genome [10]. 
Also 489 predicted NBS-LRR genes were identified in the 
rice genome, all of which belong to the non-TIR-NBS- 
LRR class [39]. From the Medicago truncatula draft gen- 
ome (Mtl.0), Ameline-Torregrosa et al. have identified 
333 non-redundant NBS-LRRs and predicted 400 to 500 
NBS-LRRs in the full genome [40]. In the Lotus genome, 
158 NBS-encoding sequences have been isolated and 
nearly 40% of them are pseudogenes encoding incom- 
plete protein sequences [41]. With advances in sequen- 
cing technologies over the past few years, next- 
generation sequencing technologies have become widely 
available and have dramatically accelerated biological 
research [42], making identification of R gene homologs 
and members of other related gene families by data- 
mining methods much more efficient. 

Common bean (Phaseolus vulgaris L.) is one of the most 
important legume crops in the world. Due to its high pro- 
tein and low fat content, it has become a principal source 
of dietary protein and fiber in many developing countries, 
especially those in Eastern and Southern Africa and Latin 
America [43]. Diseases such as anthracnose, angular leaf 
spot (ALS), rust, common bacterial blight (CBB), and Bean 
golden yellow mosaic virus (BGYMV) highly reduce bean 
production and quality. For example, CBB is a serious 
seed-borne disease that can cause over 40% yield loss in 
both temperate and tropical bean production regions [44] . 
Rust can cause 18% to 100% yield loss of susceptible bean 
genotypes under favorable environmental conditions [45]. 
Therefore, it is very important to better understand dis- 
ease resistance in common bean. 

The objective of this study was to identify expressed 
RGLs from common bean and to develop RGL-tagged 
molecular markers for genetic mapping of candidate genes 
and QTLs for disease resistance. 

Results 

Identification of RGLs from 454 sequences and ESTs in 
GenBank 

More than 1.69 million sequences with 349,977,079 base 
pairs (SRA028837) were generated by 454 sequencing 
from an adapter-ligated normalized cDNA collection from 
four different plant organs (leaves, flowers, roots, and 
pods) of common bean. After clustering and assembling 
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using the Newbler software from Roche (Branford, CT), 
these sequences were assembled into 39,572 contigs and 
19,723 singletons with sequence size more than 100 base 
pairs [46]. Moreover, 116,716 common bean ESTs are 
available in GenBank (verified on July 7, 2011). Therefore, 
a total of 176,011 sequences were used for identification of 
RGLs. 

Full length protein sequences of the 50 known R genes 
(Table 1) were used to perform tBLASTn searches against 
454-derived sequences and common bean ESTs in 
GenBank. A total of 259 contigs and nine singletons from 
454-derived sequences and 629 common bean ESTs were 
identified with scores more than or equal to 100 and E 
values less than or equal to le-10. After clustering, 365 
non-redundant sequences were developed and considered 
as tentative P. vulgaris RGLs, named as PvRGLs (See 
Additional file 1: sequences of 365 tentative PvRGLs). The 
average size of PvRGLs was about 1,110 base pairs, 
although some fragments were 2,500 base pairs or longer. 
Among these 365 tentative PvRGLs, 166 (45.48%) PvRGLs 
were from 454-derived sequences, 147 (40.27%) were from 
common bean ESTs, and 52 (14.25%) PvRGLs were 



present in both of the groups (See Additional file 2: Char- 
acterization of 365 tentative PvRGLs). 

In addition, all tentative PvRGLs were used to identify 
their putative functions by BLASTX searches against the 
GenBank databases. Among the 365 tentative PvRGLs, 29 
belonged to the NBS-LRR class, 96 belonged to the LRR, 
LRR-TM, and LRR-PK classes, 229 belonged to the PI< 
class, six were similar to R gene reated sequences with 
putative TM domains, and five showed high similarity to 
R genes with Toxin reductase domains (See Additional 
file 2: Characterization of 365 tentative PvRGLs). 

Validation of expression of selected RGLs 

In order to determine whether the identified PvRGLs 
from common bean were expressed, cDNA was prepared 
from two-week old leaves of the genotype Sierra grown 
in the greenhouse. RT-PCR was then used to amplify 
with primers specific to each of the 27 selected PvRGLs. 
Twenty four of the 27 primer pairs produced a single 
product and we showed amplification of 15 of these pri- 
mer pairs (Figure 1). Furthermore, to ensure that the 
amplification products actually amplified from expressed 



Table 1 Fifty known R genes from plants used in this study [2] 



Plant 


R genes 


Protein ID 


Structure 


Plant 


R genes 


Protein ID 


Structure 


Apple 


Vfl 


CAC40825 


LRR 


Potato 


Rl 


AAU95638 


NBS-LRR 




Fls2 


BAB11088 


LRR-PK 




Rgcl 


AAF76163 


NBS-LRR 




Pbsl 


ABR46085 


PK 




Rx 


CAB50786 


NBS-LRR 


Arabidopsis 


Rpml 


CAA61 131 


NBS-LRR 


Rice 


Pib 


BAA76282 


NBS-LRR 




Rppl 


AAC72977 


NBS-LRR 




Pi-ta 


BAF91352 


NBS-LRR 




Rpp4 


AAM 18462 


NBS-LRR 




Rprl 


BAA75812 


NBS-LRR 




Rpp5 


AAF08790 


NBS-LRR 




Xa1 


BAA25068 


NBS-LRR 




Rpp8 


AAC78631 


NBS-LRR 




Xa21 


AAC80225 


LRR-PK 




Rppl 3 


AAF42832 


NBS-LRR 




Xa26 


ABD36512 


LRR-PK 




Rps2 


AAM90883 


NBS-LRR 


Sugarbeet 


Hsl 


AAW03319 


LRR-TM 




Rps4 


CAB50708 


NBS-LRR 


Tobacco 


N 


AAA50763 


NBS-LRR 




Rps5 


AAC261 26 


NBS-LRR 




Cf-2 


AAC15779 


LRR-TM 




Rpw8 


ACJ05900 


TM 




Cf-4 


CAA05268 


LRR-TM 




Ssi4 


AAN86124 


NBS-LRR 




as 


AAC78591 


LRR-TM 




Rcy1 


AAM13905 


NBS-LRR 




Cf-9 


CAA05274 


LRR-TM 


Barley 


Rpgl 


ABK51312 


PK 


Tomato 


Hero 


CAD29728 


NBS-LRR 




Mlal 


AAG37354 


NBS-LRR 




12 


AAD27815 


NBS-LRR 




Mla6 


CAC29242 


NBS-LRR 




Mi-I 


AAC67238 


NBS-LRR 




Mb 


CAB06083 


TM 




Prf 


AAC49408 


NBS-LRR 


Flax 


16 


AAA91022 


NBS-LRR 




Pto 


AAB47421 


PK 




M 


AAB47618 


NBS-LRR 




SwS 


ACQ42910 


NBS-LRR 




P2 


AAK28806 


NBS-LRR 




Cre3 


CAD12795 


NBS-LRR 


Maize 


Hml 


AAC04333 


Toxin reductase 


Wheat 


VRGL 1 


AAF19148 


NBS-LRR 




Rpl-d 


AAD47197 


NBS-LRR 




YrlO 


AAG42168 


NBS-LRR 


Lettuce 


Dm3 


AAD03156 


NBS-LRR 


Pepper 


Bs2 


AAF09256 


NBS-LRR 
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Figure 1 Validation of fifteen selected tentative PvRGLs by RT-PCR. Confirmation of absence of DNA contamination is shown in lanes 1-4 
where RT-PCR amplification was carried out with primers designed from contigl 1286 in lanes with genomic DNA, leaf cDNA, leaf cDNA control 
(no reverse transcriptase added to reverse transcription reaction), and water as template to check DNA contamination. In lanes 5-18, RT-PCR 
products derived by amplification from an additional 14 common bean RGLs using leaf cDNA as a template are shown. M: 50 bp ladder. 



mRNA and not from contaminating DNA, we used geno- 
mic DNA, cDNA, cDNA control (no reverse transcrip- 
tase added to reverse transcription reaction), and water 
as templates to perform RT-PCR and the results showed 
that only DNA and cDNA have specific products, while 
cDNA control and water have no products (Figure 1). 
Based on these results, we concluded that there was no 
DNA contamination in the cDNA and our PvRGLs were 
indeed expressed. 

Mapping identified PvRGLs to the P. vulgaris genome 

Mapping of the identified PvRGLs is important for isola- 
tion of candidates of a particular resistance gene or quanti- 
tative trait locus (QTL). Recently, a Finger Printed 
Contigs (FPC) physical map has been established for com- 
mon bean http://phaseolus.genomics.purdue.edu/ with 
about 9X coverage of the genome [47] . The common bean 
FPCs makes it possible to integrate some of the PvRGLs 
into the physical map by comparison to BAC-end 
sequences [45]. BLASTn search was used to determine 
those PvRGLs matching to a BAC-end sequence with the 
criteria of scores more than or equal to 100 and E values 
less than or equal to le-50. As a result, 105 (28.77%) of 
the 365 tentative PvRGLs matched to BAC-end sequences 
with minimum criteria and could be integrated into the 
physical map (see Additional file 3: list of 105 tentative 
PvRGLs having matches to a BAC-end sequences). 

Interestingly, there were five cases of two PvRGLs and 
three cases of three PvRGLs located to one physical contig: 
for example, PvRGL093 and PvRGL233 were located in the 
P. vulgaris physical map FPC_Contigl3 while PvRGL266, 
PvRGL275 and PvRGL365 were located in the P. vulgaris 
physical map FPC_Contigl37. Surprisingly, PvRGL083 and 
PvRGL236 have a match to different regions of the same 
BAC-end sequence PV_GBa0056Bll.r (EI454956). Another 
seven PvRGLs matched to a Genome Survey Sequence 



(GSS) with E value less than le-50 from other common 
bean genotypes in GenBank. 

Mapping PvRGLs to G. max pseudomolecules and P. 
vulgaris genetic map 

In order to take advantage of the synteny blocks between 
common bean and soybean as per McClean et al. [48], we 
used the BLASTn algorithm to search against soybean gen- 
ome with the same criteria as McClean et al. [48]. Given 
that nearly every common bean locus can match to two 
soybean loci, the best two soybean hits were selected. Out 
of 365 PvRGLs, 360 (98.63%) PvRGLs could be mapped to 
the G. max genome pseudomolecules. Among the 360 
PvRGLs, 343 (95.28%) have soybean hits with E value less 
than le-50. The 360 PvRGLs are broadly distributed 
among the 20 G. max pseudomolecules. Among the 713 
soybean loci matched to 360 PvRGLs (7 PvRGLs with only 
one hit from soybean), 689 (96.63%) have transcripts. The 
annotation of these transcripts indicates most of them 
show high similarity to sequences containing NBS, LRR, 
PK, TM and toxin reductase domains (See Additional file 4: 
Mapping PvRGLs to G. max pseudomolecules and P. vul- 
garis genetic map). Based on the comparison of PvRGLs to 
the soybean genome sequences and the conserved blocks 
between common bean and soybean [48], a total of 237 
(64.93%) PvRGLs can be located in the P. vulgaris genetic 
map. The PvRGLs are broadly distributed among the 11 
P. vulgaris linkage groups, but the distribution was not 
even. P. vulgaris chromosome 8 (Pv8) contained the largest 
number of PvRGLs, 41; while Pv4 contained the least num- 
ber of PvRGLs, 9 (See Additional file 4: Mapping PvRGLs 
to G. max pseudomolecules and P. vulgaris genetic map). 

RGL-tagged molecular markers 

Ninety primer pairs were designed based on the 105 
PvRGLs matching to BAC-end sequences. Eighty two of 
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ninety primer pairs (91.11%) have specific products 
among the six genotypes. All of the PCR products were 
sequenced three times from both the 5' and 3' ends to 
ensure polymorphisms were not due to sequencing 
errors. The average fragment size was about 590 base 
pairs, while the range was from 309 to 1,549 base pairs. 
Approximately 291 kb of sequence data was collected 
from six genotypes for sequence analysis. Sequence 
alignments indicated single nucleotide polymorphisms 
(SNPs) and small insertions/deletions (InDels) occurred 
frequently. Twenty three SNPs were related to restric- 
tion enzyme sites, and 11 small InDels were identified. 

Based on small InDels, 11 RGL-tagged STS markers 
gave rise to distinct and polymorphic PCR bands among 
six genotypes (Table 2). Likewise, 19 RGL-tagged CAPS 
markers have been obtained from those SNPs related to 
restriction sites (Table 3), and each CAPS marker showed 
unambiguous polymorphic bands following digestion with 
its corresponding restriction endonuclease. 

Discussion 

As a component of the gene-for-gene resistance mechan- 
ism in plants, resistance genes play an important role in 
recognizing products encoded by specific avirulence genes 
of a pathogen [49] . In this study, 365 tentative RGLs from 
common bean were successfully identified by data-mining 
based on the availability of 454-derived sequences in our 
lab and common bean ESTs in GenBank. About 60% 
(218) of identified PvRGLs were from 454-derived 
sequences; moreover, 166 (76.15%) of the 218 PvRGLs 
were new transcripts. 

ESTs are highly valuable for genome annotation and 
gene structure prediction [50]. The 454 sequencing is a fas- 
ter and more cost-effective method of producing sequence 
data than the Sanger method and is capable of producing a 
400 to 600 million base pairs per run with 400 to 500 base 
pair read length [51]. It has been successfully used to 



maximize gene discovery, improve gene predictions, and 
detect SNPs and mutations [52,53]. In the past few years, 
data-mining approaches have been successfully used to iso- 
late RGLs or RGAs from sugarcane [36], wheat [37] and 
maize [25]. As reported previously, Dilbirligi and Gill [37] 
adopted four different data-mining methods including 
domain search, individual and multiple motif searches, 
consensus sequence search, and individual full-length 
search to mine 7?-gene-like wheat sequences, and showed 
that the individual full-length search was the most success- 
ful method. There were 243 NBS-LRRs in addition to 101 
other types of expressed 7?-gene candidates which were 
then isolated via an individual full-length search using a 
low E value of e-1 [37]. Xiao et al. [25] used three methods 
including modified AFLP, RACE and data-mining to isolate 
RGAs and 7?-gene-like ESTs from maize and found that the 
data-mining method is the most efficient way. A total of 
186 expressed RGAs were recovered from 550,000 maize 
ESTs using a moderate E value of e-10 or better [25]. Rossi 
et al. revealed 88 RGLs from sugarcane ESTs by using a 
very stringent E value of e-50 or better and represented 
three major classes of R genes, namely NBS-LRR, LRR-TM 
and PK [36] . The above three research reports showed that 
different E values have a great effect on the number of 
resulting RGA or RGLs. In the present study, the moderate 
E value of e-10, similar to that in maize, was used to mine 
common bean RGLs and a total of 365 tentative PvRGLs 
were identified. Of the 365 tentative PvRGLs, 29 belonged 
to NBS-LRR type, 96 belonged to LRR, LRR-TM, and 
LRR-PK type, 229 belonged to PK type, six and five con- 
tained sequences with similar to putative TM domains and 
Toxin reductase domains, respectively. The number of 
RGLs identified in the present study was about two times 
more than those in maize due to three reasons. Firsdy, 1.77 
million 454-derived sequences and common bean ESTs 
were screened to identify RGLs in common bean, about 
three times more EST sequences than those in maize 



Table 2 Primer sequences and sizes of InDels for the RGL-tagged STS markers 



PvRGL 


Forward primer 


Reverse primer 


InDel Sizes (bp)* 


PvRGLOl 1 


GAGGTCATTCACATTATCGGTTT 


TTCAAGTCACCACTGGCAAAG 


14/G 


PvRGL016 


TGCACCATGACTGTTCTCCACC 


AATACCACCTCCAGCCCAAGA 


6/G 


PvRGL041 


GCCTCCGTTTGAAAGTTGCTC 


CACGAAATATCTTACGTTTGTITGC 


7/G 


PvRGL049 


CACATTAGTAGCCTTTATGTCCC 


CTTCGGTCCTGTCTACAAGGT 


1 0/S, O, B, P, A 


PvRGL084 


TGTAATCATTGACAAGGGTGGAG 


TTGCTTGAGCATTGATGTATTG 


6/G 


PvRGL153 


AGTGAT7TCCATTTATACGAGG 


AAAAGCCAAGTAACATTCTTCT 


8/S, 0, B, P, A 


PvRGL155 


GAACCAGCAATTTGTTACCGAG 


TAGTTGTTGAGCAACCTTTAGC 


6/S, O, G, B, P 


PvRGL189 


AAGGGCAGATAACACCTCATTC 


GATAGTGGTTAAAT7TCACCAAGG 


13/S, G, B, P, A 


PvRGL193 


CAGCTATAATAACTTTGGTGGGAG 


CATTTAACAATGATGCAGGGAC 


4/S, O, G, P 


PvRGL272 


CATCAGTCACGTTGAAGGAAGG 


TATGTCCCAACAATTCTTCCCG 


3/G 



PvRGL297 TTGGAGCAATGAGAAAATAGGGG TCCAGCAGAAGCCCTGTGATG 3/P 



*The genotypes shown in this column have insertions with the corresponding InDel size 
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Table 3 Primer sequences and restriction enzymes for the RGL-tagged CAPS markers 



PvRGL 


Forward primer 


Reverse primer 


Restriction enzymes* 


rvKGLU Id 


tc c a cc atc a r — rr — rtrtrr a rr 
I GGAGGA I GAG I G I I G I GGAGG 


A AT A VV A Ct — l"VV A (~ f~(~C A A f A 

AA 1 AGGAGG 1 GGAGGGGAAGA 


rlindlll/G 


rvKGLlby 


f — T A r — VfffTf A A A f f A A Tf A ff 

G 1 AG 1 GGG 1 GAAAGGAA 1 GAGG 


tv avaaa/ i i i i i r I — vr~ a a tv v 

1 GAGAAAG 1 1 1 1 IGGIGAAIGG 


Pw-, 1 /D A 

Ural/D, A 


rvKGI_(J4l) 


TV — r~r a a a c c a c c c a c r — facacc 

1 G II AAAGGAGGGAGG 1 AGAGG 


ATrrrr/ — rrv a ac a f~c i — rr a t 
A 1 1 GGGG 1 1 GAAGAGGG 1 1 A 1 


SnaBI/G 


n. ,nc i r\ yi 1 

rvKGI_(J4 I 


c ct — rcct — i i i c aaa t — VTf~ ( — rv 
GGG 1 GGG 1 1 1 GAAAG 1 1 GG 1 G 


f~ A f~f~ A A A T A Tf — TT A f~ f — I I I 1 1 " 1 " 1 V f~ 

GAGGAAA 1 A 1 G 1 1 AGG 1 1 1 G 1 1 1 GG 


Agel/G 


rvKGL(J4y 


C A A C A ATf — FT A f — T A ( — T A f — VTCC CC 

GAAGAA 1 G 1 1 AG 1 AG 1 AG 1 1 GGGG 


f~ C( — TTVf — TV — X~Tf~ A C A ( — TA ( — ITT (~ 

GGG 1 1 GG 1 G 1 1 GAGAG 1 AG 1 1 1 G 


D-f 1 ^ K 1 /C C\ C D D 

PtlMI/b, U, G, d, r 


rvKGLUbU 


f AT ATT A f — r A C CC 1 "1 " 1 ATI — VCCC 

GAGA 1 1 AG 1 AGGG 1 1 1 A 1 G 1 GGG 


( — X~Tf~f~t — VCf — Fr — T( — TAVA Aff — T 

G 1 1 GGG 1 GG 1 G 1 G 1 AGAAGG 1 


■ ■ : , . ^ — J ill if 

Hindi ll/tj 


n. ,nc i r 


ccr — rTArrAi' — rrc ( — vac At — ta ac ac 
GGG 1 1 AGGAG 1 1 GG 1 AGAG 1 AAGAG 


TV A A W A A A V A f~(~C A V A V A ATV 

1 GAAGGAAAGAGGGAGAGAA 1 G 


Nsil/b, U, d, r, A 


n. ,nc i 


CC A A A A TV A C A A A / — III V V V A TC 

GGAAAA 1 GAGAAAG 1 1 1 GGGA 1 G 


AC A C CCf — VCCC V ACC A A AC AC 

AGAGGGG 1 GGGGAGGAAAGAG 


Af 1 1 l/G 


rvKGL(J9J 


TCC f — TTA ( — Tf — TTf — l"V A VV ATf — T 

1 GGG 1 1 AG 1 G 1 1 G 1 GAGGA 1 G 1 


T ATf — VCCCC ATC ATCf — VCf — TA C 

1 A 1 G 1 GGGGA 1 GA 1 GG 1 GG 1 AG 


Ural/U, r 


n. ,nc i r\r\n 


v i Trrrr a Tf — ret — rrr — rv — r atc 
G 1 1 GGGGA IGIGGI IGIGIAIG 


c ccf — r atc f — rc a ccr — i i 1 1 — vac 
GGGG 1 A 1 GG 1 GAGGG 1 1 1 G 1 AG 


□glll/D, G 


rvKGLl (Jz 


TV A A CCCf — FTATrr A V A C f — FCC 

1 GAAGGGG 1 1 A 1 GGAGAGG 1 GG 


TV ATf ATC C CC AAA CCCC A AC 

1 GA 1 GA 1 GGGGAAAGGGGAAG 


Xmnl/b, U, r 


rvKGLl b5 


A t — YC A ITT VV A 1 T 1 ATA CC A VV 

AG 1 GA 1 1 1 GGA 1 1 1 A 1 AGGAGG 


A A A A CCC A A f — T A AC A TTf — TTV — T 

AAAAGGGAAG 1 AAGA 1 1 G 1 1 G 1 


Sphl/0 


PvRGL155 


GAACCAGCAATTTGTTACCGAG 


TAGTTGTTGAGCAACCTTTAGC 


Ndel/0, B, P 


PvRGL173 


GGATCATTGCCTGTATTCGAGG 


TGGCACTGTCCCTGATAAACTG 


EcoRI/G, B, A 


PvRGL246 


CAGCCACCTTGGGTTGGTAATC 


GACGCAGTTGCTAAATGGACACG 


Hindlll/S, 0, P 


PvRGL264 


GTAACTCGTGTCCTCCTCATGT 


GATACTTGAGAAGGTAAAGGCTG 


Spel/B 


PvRGL297 


TTGGAGCAATGAGAAAATAGGGG 


TCCAGCAGAAGCCGTGTGATG 


Spel/G 


PvRGL302 


CTGTTAAAGGAGGCACGTACAC 


TATTGGGGTTCAAGACGGTTAT 


SnaBI/G 


PvRGL308 


TAAGATGTGTGAATAAATATGGTG 


GAGGAAGTTGCTGTAAAGAAGT 


BspMI/G 





*The genotypes shown in this column can be digested by the corresponding restriction enzyme 



(550,000 ESTs). Secondly, 454 sequencing can generate 
more sequence data than the Sanger sequencing. There- 
fore, transcripts at extremely low levels can be detected 
[51]. Finally, some identified PvRGLs match the same R 
genes or RGLs. For example, PvRGL083 and PvRGL236 
match different regions of the same BAC-end sequence. 

RT-PCR was used to examine the expression of the 
PvRGLs in this study. Results indicated that all of the 
selected PvRGLs were actually expressed in the leaves of 
genotype Sierra (Figure 1). In contrast, many RGLs or 
RGAs amplified from genomic DNA using degenerate 
primers or mined from whole genome are not expressed 
[33]. Previously, eight classes of disease-resistance related 
sequences were amplified from common bean DNA 
using degenerate primers based on the conserved NBS 
domain [33]. Expression analysis indicated that three 
RGAs (SB1, SB3 and SB8) were not expressed [33]. In 
Lotus, 62 NBS-encoding sequences were considered as 
pseudogenes due to encoding of incomplete protein 
sequences [41]. In Arabidopsis, at least 12 NBS-LRR- 
encoding genes were predicted to be pseudogenes due to 
frame shift and nonsense mutations [10]. 

The PvRGLs discovered in this study correspond to 
most of the 25 previous common bean R genes or RGAs 
in the PRGdb database [2]. PvRGL266 and PvRGL275 
have strong hits to the P. vulgaris TL5601 disease resis- 
tance protein gene with amino acid identity of 90% and 



85%, respectively. TL5601, located in the / locus of com- 
mon bean, controls resistance to Bean Common Mosaic 
Virus [54]. PvRGL262 matched to coiled-coil NBS-LRR 
(CNL)-Bll with amino acid identity of 94%. CNL-B11 was 
mapped in the B4 7? gene cluster which contained at least 
three R genes (Co-9, Co-y, and Co-z) and QTL effective 
against anthracnose, and Bean golden yellow mosaic virus 
[55,56]. PvRGL309 was a part of polygalacturonase-inhi- 
biting protein (PGIP) gene. PGIP can inhibit fungal endo- 
polygalacturonases and is considered to be an important 
factor for plant resistance to phytopathogenic fungi [57]. 
PvRGL294 was the same as, but much longer in the 5' end 
than the previous RGL SB3 and OB9 [33]. 

So far, most of the known R genes have been cloned by 
map-based cloning and transposon tagging approaches 
[58]. Therefore, mapping RGLs and RGAs to genomes 
and/ or genetic maps is very important and will facilitate R 
gene cloning. In this study, 105 PvRGLs could be inte- 
grated into the common bean FPC physical map by com- 
parison of PvRGLs to P. vulgaris BAC-end sequences. 
Additionally, we were able to anchor 237 PvRGLs to the 
common bean genetic map by using conserved syntenic 
blocks between common bean and soybean. The PvRGLs 
are broadly, but unevenly distributed among the 11 linkage 
groups of common bean with a strong tendency of cluster- 
ing. For example, 41 PvRGLs, the largest number, were 
mapped to Pv8, while 17 of them clustered at the bottom 
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of Pv8. David et al. (2008) also found that many specific R 
genes against various pathogens cluster together: for 
example, in the B4 resistance gene cluster, 73 BAC clones 
(FI159954 - FI160067) were identified by using a NBS 
probe PRLJ1 to screen a common bean BAC library [56]. 
In the present study, PvRGL173 was anchored at the top 
of Pv4 by in silico mapping. Meanwhile, PvRGL173 shows 
high similarity (92.56% and 96.79%) with BAC-end 
sequence FI160023 (5e-153) and FI159996 (le-101), 
respectively. PvRGL173 should be located in the B4 resis- 
tance gene cluster. The B4 resistance gene cluster contains 
various 7? genes resistant to different pathogens, such as 
Colletotrichum lindemuthianum (anthracnose), Uromyces 
appendiculatus (rust), and Pseudomonas syringae pv. pha- 
seolicola (halo blight), in addition to QTLs for resistance 
to BGYMV and anthracnose; therefore further work will 
be needed to determine the role of PvRGL173. In Medi- 
cago, two superclusters of disease resistance genes were 
identified. One is located at the top of Mt3 containing 73 
CNL and 9 TNL encoding sequences. The other is located 
at the bottom of Mt6 containing 57 TNL encoding 
sequences [38] . Clustering of R genes facilitates the genetic 
variation of R genes and benefits in the evolution of new R 
genes. Bertioli et al. found that retrotransposons were 
associated with the evolution of some resistance gene clus- 
ters via analysis of the synteny among Arachis, Lotus, and 
Medicago [59]. Several other hypotheses such as duplica- 
tion, gene conversion, and unequal crossing-over have 
been proposed to elucidate R gene cluster and evolution 
[1,38]. Therefore, further studies will be needed to better 
understand common bean R gene evolution. 

Conclusions 

In this study, 365 PvRGLs were identified from 454- 
derived sequences and common bean ESTs in GenBank 
using data-mining. As a result, 105 and 237 PvRGLs were 
mapped to the P. vulgaris FPC physical map and genetic 
map, respectively. RGL-tagged markers were developed 
for 25 unique PvRGLs, including 11 STS and 19 CAPS 
markers. These methods and results will help develop 
more RGL-tagged molecular markers that can be used 
further for genetic mapping and the isolation of resistance 
genes from common bean in the future. 

Methods 

Plant material 

Plant genotypes used in this study included P. vulgaris cv. 
Sierra (S), Olathe (O), G19833 (G), Bat93 (B), Pintoll4 
(P), and Aurora (A). All of the genotypes were planted in a 
greenhouse. Leaves of each genotype were harvested from 
two-week old seedlings for DNA extraction. Additionally, 
two-week old leaves from Sierra were collected for RNA 
extraction. 



Data mining of common bean RGLs 

The amino-acid sequences of 50 known R genes (Table 1), 
covering the five major R gene classes, were used to search 
for (tBLASTn) homologues in 454-derived sequences and 
common bean ESTs in GenBank http://www.ncbi.nlm.nih. 
gov. Those sequences with scores more than or equal to 
100 and E values less than or equal to le-10 were clus- 
tered to develop unigenes, and all of the unigenes were 
considered as putative PvRGLs [25]. The resulting uni- 
genes were in turn used to search the GenBank databases 
by BLASTX to confirm their putative R gene-like func- 
tions. In addition, all tentative PvRGLs were used to iden- 
tify possible genomic sequences in the Phaseolus genome 
database http://phaseolus.genomics.purdue.edu/. 

Mapping of PvRGLs to G. max pseudomolecules and 
P. vulgaris linkage groups 

The 20 pseudomolecules of the soybean genome were 
downloaded from ftp://ftp.jgi-psf.org/pub/JGI_data/phy- 
tozome/v7.0/Gmax/[60]. BLASTn algorithm was used to 
search soybean genome as per McClean et al. [48]. 
Firstly, we selected the hits with a cutoff E value less 
than le-10 and overlaps of at least 150 bp. Secondly, the 
best two soybean hits for PvRGLs were retrieved for 
further analysis (See Additional file 4 for all of PvRGLs 
that matched to soybean with these criteria). The 
PvRGLs were mapped to common bean linkage groups 
based on the syntenic blocks between common bean 
and soybean according to McClean et al. [48] . 

RNA extraction and expression analysis 

Total RNA was isolated from two-week old leaves with 
Trizol (Invitrogen, Carlsbad, CA) according to the manu- 
facturer's instructions. Total RNA was treated with DNa- 
sel two times to remove all DNA contamination. cDNA 
synthesis was carried out from 1 ug of RNA using the 
MMLV reverse transcriptase (New England Biolabs, Ips- 
wich, MA). The RT-PCR primers were designed and 
synthesized at Integrated DNA Technologies (IDT, Cor- 
alville, IA). The RT-PCR was performed with the BIO- 
RAD PCR system (Hercules, CA) with the following 
cycling parameters: 94°C for 3 min; 35 cycles of 94°C for 
30 s, 58°C for 30 s, 72°C for 60 s, and a final elongation at 
72°C for 5 min. 

PCR amplification and sequence analysis 

DNA was isolated from the above six genotypes accord- 
ing to the methods described in Doyle and Doyle [61]. 
PCR amplification was carried out by designing forward 
and reverse primers from PvRGLs having a hit to BAC- 
end sequences. The PCR products were purified using 
QIAquick® Gel Extraction kit (Germantown, MD) and 
sequenced at the Macrogen USA Corp (Rockville, MD). 
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The software DNAMAN version 4.0 was used for 
sequence assembly and multiple sequence alignment to 
confirm either identities or similarities. 

Development of RGL-tagged molecular markers 

PvRGL sequences amplified from the above six geno- 
types were carefully analyzed by the DNAMAN program 
to identify InDels and SNPs, and those with InDels and 
SNPs were taken as candidate RGL markers. For STS 
markers, two 200-bp sequences flanking the InDel site 
were extracted and used to design PCR primers. Poly- 
morphic PCR bands from the six genotypes were 
observed on 6% polyacrylamide gels. For CAPS markers, 
polymorphic PCR bands were observed among the six 
genotypes following digestion with a particular restric- 
tion endonuclease. 

Additional material 



Additional file 1: Sequences of 365 tentative PvRGLs. 

Additional file 2: Characterization of 365 tentative PvRGLs 

Additional file 3: 105 PvRGLs having a hit with a BAC-end 
sequence. 

Additional file 4: Mapping PvRGLs to G. max pseudomolecules and 
P. vulgaris genetic map 
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